We present ML4PG - a machine learning extension for Proof General. It allows one to gather proof
statistics related to shapes of goals, sequences of applied tactics, and proof tree structures from
the libraries of interactive higher-order proofs written in Coq and SSReflect. The gathered data
is clustered using the state-of-the-art machine learning algorithms available in Matlab and Weka.
ML4PG provides automated interfacing between Proof General and Matlab/Weka. The results of
clustering are used by ML4PG to provide proof hints in the process of interactive proof development.
1 Introduction

Over the last few decades, theorem proving has seen major developments. Automated (first-order)
theorem provers (ATPs) (e.g. E [41], Vampire [40] and SPASS [46]) and SAT/SMT solvers (e.g. CVC3 [5],
Yices [?] and Z3 [38]) are becoming increasingly fast and efficient [33]. Interactive (higher-order)
theorem provers (ITPs) (e.g. Coq [12], Isabelle/HOL [39], Agda [?], Matita [?] and Mizar [?])
have been enriched with dependent types, (co)inductive types, type classes and now provide a very rich
programming environment [18, 42, 22, 30].

The main conceptual difference between ATPs and ITPs lies in the style of proof development: for
ATPs, the proof process is primarily an automatically performed proof search; for ITPs, it is mainly
user-driven proof development. As ITPs work with higher-order logic and type theory [?], where many
algorithms and procedures are inherently undecidable, they require guidance from the user. Nevertheless,
ITPs have seen major advances in proof automation [19, 24, 36]. One particular trend is to reinforce
proof automation in ITPs by employing state-of-the-art tools from ATPs [?], SAT/SMT solvers [9, 1, 26]
or Computer Algebra systems [24, 7, 32]. One major success of this approach is Sledgehammer [36]:
it offers Isabelle/HOL users an option to call for an ATP/SMT-generated solution [9].

Integrating ITPs with ATPs requires a lot of research into methods of interfacing. Namely, the major
challenge is a sound and reliable translation between inherently different languages [37, 24, 32, 1]. This
especially concerns interpreting outputs from ATPs back into the higher-order environment [1, 37], which
we will also call here backwards interfacing. For example, Sledgehammer uses the results provided by
external tools to guide the higher-order proofs, but leaves it to the Isabelle/HOL kernel to check that the
suggested tactic combination is valid.
In parallel to the work mentioned above, another trend of research has developed. It approaches
the issue of improving proof automation from the perspective of statistical and machine learning methods.
Several aspects of automated and interactive theorem proving can be data-mined:

• proof heuristics can be data-mined to improve proof search in ATPs [14, 15, 29, 31, 34, 43, 45, 44];

• history of successful and unsuccessful proof attempts can be used to inform interactive proof
development in ITPs [16, 31].

The former trend has been more successful, as first-order automated proofs tend to have more regular 
structure. In the case of higher-order interactive proofs, there are four main issues that make statistical 
data-mining challenging: 

C.1. The richer language reduces the chance of finding regularities and proof patterns by data-mining
the syntax alone. ITP-based proofs involve an unlimited variety of structures and proof patterns, 
in comparison to ATPs, where resolution or rewriting may be the two possible rules to apply. 
Moreover, in ITPs, one and the same goal may have a range of different proofs, whereas different 
goals can be proven by the same sequence of tactics. Hence, finding statistically significant proof 
features becomes challenging. 

C.2. The notion of a proof may be regarded from different perspectives in ITPs: it may be seen as
a transition between the subgoals [15, 45, 44], a combination of applied tactics [16], or - more
traditionally - a proof tree showing the overall proof strategy [31]. Depending on the nature of
the proof and application areas of the machine learning tools, each of the three aspects can be
important for statistical proof pattern recognition.

C.3. Backwards interfacing - interpreting results provided by the statistical machine learning tool back 
into the higher-order interactive prover - can be a challenge. 

C.4. In interactive proofs, the most time-consuming and challenging part is no longer the time the prover 
takes to find the proof. It is the time the proof developer takes to understand and guide the proof. 
Therefore, when data-mining interactive proofs, we are interested not only in the final result - the 
successful proof, but also in the proof process, including failed and discarded derivation steps. We 
want machine learning to guide the process, not to diagnose or speed up already found proofs. For 
this, machine learning tools for ITPs need to be interactive. 

Up to now, experiments on data-mining interactive proofs were always constrained by the lack of
interactive interfacing between machine learning algorithms and the user-driven proof development. For
example, in [29], there was a tool that gathered statistics, but no automated data-mining tools were used;
in [31], there was a feature extraction method to data-mine proofs but it was not connected to efficient
statistics gathering; in [16], these two were semi-automated.

Because of the inherently interactive nature of proofs in ITPs, user interfaces for ITPs play an
important role in proof development. For our experiments, we chose Proof General [?] - a general-purpose,
Emacs-based interface for a range of higher-order theorem provers, e.g. Isabelle, Coq or Lego. Among
them, we have chosen Coq [12] and its SSReflect library [19] for our experiments. Although both are built
upon the same language - the Calculus of Inductive Constructions [13] - they have distinct proof styles,
the analysis of which plays a special role in this paper, see Section 3.

This trend of maintaining a strong, convenient interface for a range of proof systems is mirrored by
a similar trend in the machine learning community. As statistical methods require users to constantly
interpret and monitor results computed by the statistical tools, the community has developed uniform
interfaces - convenient environments in which the user can choose which machine learning algorithm
to use for processing the data and for interpreting results. One well-known interface of this kind, which
we use in our experiments, is Matlab [35]; it has its own underlying programming language, and comprises
several machine learning toolboxes, from the general-purpose Statistical Toolbox to the specialised
Neural-network Toolbox. The second major machine learning interface we explore is Weka [23] - an open
source, general-purpose interface to run a wide variety of machine learning algorithms.

We have already referred to the two different meanings of the term "interfaces". On the one hand,
interfacing may mean translation mechanisms connecting ITPs with other proof automation tools
[37, 24, 32, 1]; and on the other hand, it is used as a synonym for a user-friendly environment. In this
paper, the two views on the notion of interfaces meet. Our primary goal is to integrate the state-of-the-art
machine learning technology into ITPs, in order to facilitate the automation and improve the user experience.
However, since machine learning algorithms will need to gather statistics from the user's behaviour, and
feed the results back to the user during the proof development process, this primary task will never be
accomplished without machine learning becoming an integral part of the user interface.

In this paper, we show the results of our work on interfacing interfaces - building a user-friendly
environment that integrates a range of machine learning tools provided by Matlab and Weka into Proof
General. In particular, we pay attention to addressing the challenges C.1-C.4. We implement the
following vision of interfacing between ITP and machine learning, and call the result ML4PG (machine
learning for Proof General); available at [25].

1. ML4PG must be able to gather statistics from interactive proofs (challenge C.3), and relate these
statistics accurately to the three aspects of ITP-based proof development: goal level, tactic level,
and proof tree level (challenge C.2). We focus on this issue in Section 2.

2. ML4PG must automatically extract the relevant features associated with these three aspects in a
form suitable for machine learning tools - that is, numerical vectors of fixed length, also known
as feature vectors (challenges C.1-C.2). We focus on this issue in Section 3.

3. ML4PG must enable the user to choose from a range of machine learning interfaces and algorithms
suitable for proof data-mining (challenge C.4). As we do not assume the Proof General user to have
machine learning expertise, we want to delegate a substantial amount of pre- and post-processing
of statistical results to ML4PG. Section 4 deals with these questions.

4. ML4PG must automatically connect to the chosen machine learning interface, and it should collect,
appropriately analyse and interpret the output of these algorithms (challenge C.3). Note that, in our
work, "backwards interfacing" from the machine learning tools to Proof General is less demanding
compared to [37, 24, 32, 1]. We do not seek a translation of statistical results into the Coq language
- instead, we use the statistical results to inform the user of arising proof patterns during the proof
development. As Sections 4 and 5 show, this kind of light backwards interfacing can be efficiently
implemented.

5. Finally, ML4PG must give the user relevant information about the user's current proof goal in
relation to statistically similar proof patterns detected in different libraries or even across different
users (challenges C.1-C.4). We discuss this in Sections 4 and 5.

Our primary goal in writing ML4PG was to provide the user with a friendly and light-weight tool
that helps to find patterns in big proof developments. Therefore, it was important for us to embed a
number of simple but useful options that a user with no experience in machine learning could use; see
also Figure 1. Each section is devoted to one such aspect: Section 2 explains how various proof level
views can be used in ML4PG. Section 3 presents the original method of automated feature extraction
embedded into ML4PG; we call the method a proof trace method. Section 4 explains how different
machine learning algorithms can be accessed through ML4PG; and Section 5 describes how results
produced by those algorithms are analysed by ML4PG. Finally, in Section 6 we conclude and discuss
future extensions.

Figure 1: Toolbar of the Proof General interface together with ML4PG extension. The ML4PG extension consists of the
Statistics menu. It also shows ML4PG options for data-mining three levels of proof.

Goal                          Tactic             |  Goal                                        Tactic
∀n : nat, 0 = n*0             induction n.       |  ∀l : list A, l ++ [] = l                    induction l.
1. 0 = 0*0; 2. 0 = S n * 0    simpl; trivial.    |  1. [] ++ [] = []; 2. (a::l) ++ [] = a::l    simpl; trivial.
0 = S n * 0                   simpl; trivial.    |  (a::l) ++ [] = a::l                         simpl.
□                             Qed.               |  a :: (l ++ []) = a :: l                     rewrite IHl.
                                                 |  a :: l = a :: l                             trivial.
                                                 |  □                                           Qed.

Table 3: Proof steps for Lemmas mult_n_0: ∀n : nat, 0 = n*0 and app_l_nil: ∀l : list A, l ++ [] = l.



2 The three levels of an interactive proof 

In this section, we consider a variety of possible approaches to proof pattern recognition in ITPs; namely,
we consider automated proofs from the levels of goal transitions, tactic sequences, and proof trees.
Figure 1 shows the extension of the Proof General interface we start to develop in this section.

We start with several running examples to illustrate the kind of statistical help we expect from 
ML4PG. We consider the library containing various lemmas about natural numbers and lists. 

Example 2 Suppose the user starts with the following two lemmas about multiplication by 0: Lemma
mult_n_0 : ∀n : nat, 0 = n*0 and Lemma mult_0_n : ∀n : nat, 0 = 0*n, see the left side of Tables 3 and 4
for the sample proofs. They state two very similar properties; however, their proofs are different:
notably, one proof involves induction, while the other involves only simplification.

Next, suppose the user switches to the library containing lists, and needs some guidance to proceed
with the proofs for Lemma app_l_nil : ∀l : list A, l ++ [] = l, and Lemma app_nil_l : ∀l : list A, [] ++ l = l.
The user asks ML4PG to "statistically match" these problems to previously seen proofs in the same or


E. Komendantskaya, J. Heras & G. Grov 






Goal                  Tactic             |  Goal                          Tactic
∀n : nat, 0 = 0*n     intro.             |  ∀l : list A, [] ++ l = l      intro l.
0 = 0*n               simpl; trivial.    |  [] ++ l = l                   simpl; trivial.
□                     Qed.               |  □                             Qed.

Table 4: Proof steps for Lemmas mult_0_n : ∀n : nat, 0 = 0*n and app_nil_l : ∀l : list A, [] ++ l = l.

in a different library. We then want ML4PG to tell the user that there are two similar lemmas in the Nat
library, namely Lemma mult_n_0 : ∀n : nat, 0 = n*0 and Lemma mult_0_n : ∀n : nat, 0 = 0*n. Then
the user will adapt these old proofs to complete new proofs as given on the right side of Tables 3 and 4.
Note that this guidance will go further than just identifying proofs over the same data type, identifying
same tactic combinations, same functions/operations or similar lemma shapes. Such guidance would be
based on statistical correlation of several proof features.

As can be seen from these examples, the user may be interested in data-mining the proofs based on
either

★ transitions between subgoal-shapes (in which case Lemmas app_l_nil and mult_n_0 of Table 3
have common patterns), or

★ statistics of tactic combination (in which case Lemmas app_nil_l and mult_0_n of Table 4 should
be identified), or

★ a more general understanding of lemma content (in which case all four are similar).

Therefore, we distinguish three levels at which pattern-recognition in ITP proofs can be approached,
see also [?]:

1. Goal-pattern recognition. Sequences of subgoals may show an apparent pattern in the structure of
the formulas. This type of feature abstraction has been used for learning the inputs for automatic
provers [?] - which has later been extended to interactive proofs [?].

Example 5 The left-most columns of Tables 3 and 4 should be used to gather such information
about goal-patterns.

2. Tactic-pattern recognition. Sequences of tactics applied at every level of the proof bear some
apparent patterns, as well. There is always a finite number of tactics for any given proof, and
therefore, they may serve well as features for statistical learning. Previous work on learning proof
strategies [27, 16] has taken this approach. It is important to note that there may be proofs in
which the goal structures do not bear any evident pattern; however, the sequence of applied tactics
does. Also, as an additional complication, there is a variety of tactic combinations that may lead
to a successful proof for one goal; and vice versa, different goals may yield the same sequences
of tactics in successful proofs. Moreover, tactics often have complex configurations, which can be
hidden or given as arguments (e.g. rules to apply or instantiations of variables).

Example 6 The right-most columns of Tables 3 and 4 provide information about such
tactic-patterns.
forall l: list A, l ++ [] = l
  |
  | induction l
  |
  +-- [] ++ [] = []
  |     | simpl; trivial
  |     □
  |
  +-- (a: A) (l: list A) (IHl: l ++ [] = l) |- (a::l) ++ [] = a::l
        | simpl
      (a: A) (l: list A) (IHl: l ++ [] = l) |- a::l ++ [] = a::l
        | rewrite IHl
      (a: A) (l: list A) (IHl: l ++ [] = l) |- a::l = a::l
        | trivial
      □

Figure 7: Proof tree for app_l_nil.

The disadvantage of tactic-pattern recognition is that any knowledge of when and why a tactic is 
applied, as well as its result is lost (except with respect to other tactic applications). 

3. Proof tree pattern recognition. Finally, there is the level of a proof tree - that shows relations
between different proof branches and subgoals and gives a better view of the overall proof flow;
this approach was tested in [31] using multi-layer neural networks and kernels.

Example 8 Figure 7 shows the proof tree for app_l_nil. An advantage of the proof tree as
opposed to goal or tactic sequence, is that it distinguishes between different proof branches.
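
The three levels can be read off a single data structure. The following is a minimal sketch, not ML4PG's implementation: the class `ProofNode` and the three helper functions are hypothetical names introduced here, and the tree is transcribed from Figure 7.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProofNode:
    """One node of a proof tree: a goal and the tactic applied to it."""
    goal: str
    tactic: Optional[str] = None            # None would mean the goal is still open
    children: List["ProofNode"] = field(default_factory=list)


def goal_sequence(node: ProofNode) -> List[str]:
    """Goal-level view: goals in depth-first visiting order."""
    goals = [node.goal]
    for child in node.children:
        goals.extend(goal_sequence(child))
    return goals


def tactic_sequence(node: ProofNode) -> List[str]:
    """Tactic-level view: applied tactics in depth-first order."""
    tactics = [node.tactic] if node.tactic is not None else []
    for child in node.children:
        tactics.extend(tactic_sequence(child))
    return tactics


def branches(node: ProofNode) -> List[List[str]]:
    """Tree-level view: one root-to-leaf tactic path per proof branch."""
    here = [node.tactic] if node.tactic is not None else []
    if not node.children:
        return [here]
    return [here + path for child in node.children for path in branches(child)]


# Proof tree of app_l_nil, transcribed from Figure 7 (hypotheses omitted).
app_l_nil = ProofNode(
    "forall l : list A, l ++ [] = l", "induction l",
    [
        ProofNode("[] ++ [] = []", "simpl; trivial"),
        ProofNode("(a::l) ++ [] = a::l", "simpl", [
            ProofNode("a::l ++ [] = a::l", "rewrite IHl", [
                ProofNode("a::l = a::l", "trivial"),
            ]),
        ]),
    ],
)
```

Here `tactic_sequence(app_l_nil)` flattens the proof into the right-hand column of Table 3, whereas `branches(app_l_nil)` keeps the base case and the inductive case apart, which is the extra information the tree level provides.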

Our second running example is based on the bigop library of SSReflect. This library is devoted to
generic indexed big operations, like $\sum_{i=0}^{n} f(i)$ or $\bigcup_{i \in I} f(i)$.

Example 9 We take three lemmas about number series:

$\forall n,\ 2\left(\sum_{i=0}^{n} i\right) = n(n+1)$;  $\forall n,\ \sum_{i=0 \,\mid\, \mathrm{odd}\ i}^{2n} i = n^2$;  $\forall n,\ \prod_{i=1}^{n} i = n!$



The proofs of these three lemmas, both at the level of goals and tactics, are given in Table 10. Intuitively,
they show certain similarities and dissimilarities, both at the level of goals and tactics. In the next
sections, we will test how ML4PG analyses such cases.
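
As a quick numerical sanity check of the three statements in Example 9 (a sketch only; it tests small values of n and is unrelated to the SSReflect proofs themselves; the function name `identities_hold` is introduced here for illustration):

```python
from math import factorial, prod


def identities_hold(n: int) -> bool:
    """Check the three number-series identities of Example 9 for one value of n."""
    # 2 * (0 + 1 + ... + n) = n * (n + 1)
    first_n = 2 * sum(range(n + 1)) == n * (n + 1)
    # sum of the odd i with 0 <= i <= 2n equals n^2
    first_n_odd = sum(i for i in range(2 * n + 1) if i % 2 == 1) == n ** 2
    # 1 * 2 * ... * n = n!  (the empty product for n = 0 is 1 = 0!)
    fact = prod(range(1, n + 1)) == factorial(n)
    return first_n and first_n_odd and fact


all_hold = all(identities_hold(n) for n in range(50))
```

All three identities hold on this range, yet, as Table 10 shows, their interactive proofs differ considerably in length and in the lemmas they rewrite with.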

Next, we study how these general considerations about the levels of proof patterns are used in 
ML4PG to extract features used in statistical data-mining. 



3 Feature extraction in ML4PG: the Proof Trace method 

In this section, we explain algorithms used by ML4PG to gather proof statistics at the levels of goals, 
tactics, and proof trees. 

The discovery of statistically significant features in data is a research area of its own in machine 
learning, known as feature extraction, see [8]. Irrespective of the particular feature-extraction algorithm 
used, most pattern-recognition tools will require that the number of selected features is limited and fixed. 
We design our own method of proof feature extraction. The major challenge is to respect the above 
Goal                                                                                Tactic
$2(\sum_{i=0}^{n} i) = n(n+1)$                                                      elim : n.
$2(\sum_{i=0}^{0} i) = 0 \times 1$                                                  by rewrite mul0n big_nat1 muln0.
$\forall n,\ 2(\sum_{i=0}^{n} i) = n(n+1) \Rightarrow 2(\sum_{i=0}^{n+1} i) = (n+1)(n+2)$    move => n IH.
$2(\sum_{i=0}^{n+1} i) = (n+1)(n+2)$                                                by rewrite big_nat_recr mulnDr IH -mulnDl addn2 mulnC.
□                                                                                   Qed.

Goal                                                                                Tactic
$\sum_{i=0 \mid odd\ i}^{2n} i = n^2$                                               elim : n.
$\sum_{i=0 \mid odd\ i}^{2 \times 0} i = 0^2$                                       rewrite exp0n // /index_iota subn0 big1_seq //.
$\forall i \in N,\ odd\ i\ \&\&\ (i \in iota\ 0\ (2 \times 0)) \Rightarrow i = 0$   by move => i; move/andP => [_ H2]; move : H2;
                                                                                    rewrite muln0 in_nil.
$\forall n,\ \sum_{i=0 \mid odd\ i}^{2n} i = n^2 \Rightarrow \sum_{i=0 \mid odd\ i}^{2(n+1)} i = (n+1)^2$    move => n IH.
$\sum_{i=0 \mid odd\ i}^{2(n+1)} i = (n+1)^2$                                       by rewrite big_mkcond -[n.+1]addn1 mulnDr muln1
                                                                                    addn2 !big_nat_recr IH odd2n odd2n1 //= addn0
                                                                                    n1square n2square.
□                                                                                   Qed.

Goal                                                                                Tactic
$\prod_{i=1}^{n} i = n!$                                                            elim : n.
$\prod_{i=1}^{0} i = 0!$                                                            by rewrite big_nil.
$\forall n,\ \prod_{i=1}^{n} i = n! \Rightarrow \prod_{i=1}^{n+1} i = (n+1)!$       move => n IH.
$\prod_{i=1}^{n+1} i = (n+1)!$                                                      by rewrite factS big_add1 -IH big_add1 big_nat_recr mulnC.
□                                                                                   Qed.

Table 10: Interactive proofs for Lemma sum_first_n: $2(\sum_{i=0}^{n} i) = n(n+1)$; Lemma sum_first_n_odd:
$\sum_{i=0 \mid odd\ i}^{2n} i = n^2$; and Lemma fact_prod: $\prod_{i=1}^{n} i = n!$.
      tactics    N tactics    arg type    tactic arg is hypothesis?    top symbol    n subgoals
g1
g2
g3
g4
g5

Table 12: Goal-level feature extraction table. Parameters inside the double lines are the extracted features. Notation g1-g5
is used to denote five consecutive subgoals in the derivation. Columns are the properties of subgoals the feature extraction
method of ML4PG will track: names of applied tactics, their number, arguments, link of the proof step to a hypothesis (Hyp),
inductive hypothesis (IH) or a library lemma (EL); top symbol of the current goal, and the number of the generated subgoals.





      tactics           N tactics    arg type    tactic arg is hypothesis?    top symbol    n subgoals
g1    induction         1            nat         no                           forall        2
g2    simpl; trivial    2            none        no                           equal         0
g3    simpl; trivial    2            none        no                           equal         0
g4
g5

      tactics           N tactics    arg type    tactic arg is hypothesis?    top symbol    n subgoals
g1    induction         1            list        no                           forall        2
g2    simpl; trivial    2            none        no                           equal         0
g3    simpl             1            none        no                           equal         1
g4    rewrite           1            Prop        IH                           equal         1
g5    trivial           1            none        no                           equal         0

Table 13: Comparison of the goal-level feature tables for mult_n_0 and app_l_nil. Intuitively, 18 out of 30 features in
these tables show correlation; we highlight them in bold.

restriction while allowing us to data-mine a potentially unlimited variety of different higher-order formulas
and proofs.

We first focus on the level of goals. ML4PG must choose the relevant features for statistical analysis. 
At this level, we could consider general goal properties such as "goal shape" (e.g. "associative-shape" 
or "commutative-shape"), or properties like "the goal embeds a hypothesis", "the goal is embedded into 
a hypothesis". However, gathering such features uniformly across any set of proofs would be hard, 
especially when working with richer theories and dependent types. 

Example 11 Consider the proof for app_l_nil given in Table 3. One could say that the valuable
information about the shape of the (sub)goal 4 is that it embeds the inductive hypothesis, as this fact is later
used in the proof. However, for more complex examples, deciding such embeddings unambiguously
during feature-extraction may be non-trivial, see [6]. Finally, detecting a fixed number of properties like
e.g. "commutativity" may apply to one type of proof libraries, e.g. natural numbers, but not to others,
e.g. lists, in which case uniform comparison of proof patterns across libraries becomes hard.

This is why we developed a method of implicit tracking of proof properties, called the proof trace
method; its early variant was used in [31]. The idea is as follows. When direct feature extraction of the
goal shapes is infeasible, we can still infer some properties of the goals by gathering statistics of how
the user treats the goals. In other words, we let the term structure show itself through the proof steps
it induces. We deliberately do not pre-define the types of proof patterns that ML4PG must recognise,
or even what a correlation of proof features is. Instead, we want statistical machine learning tools to
suggest to the user what these might be. An advantage of such a proof feature extraction method is that it
applies uniformly to any Coq library, and does not require any adaptation when ML4PG changes the
libraries.

Another important feature ML4PG must be sensitive to, is the long-lasting effect of the first proof 
step on several consecutive proof steps. The dependency between subgoals very often extends much 
farther than from one proof step to its immediate successor. Thus, we want to capture two dimensions of 
goal transformation in a proof: 

1. various properties of a single (sub)-goal; 

2. transformations of each such property throughout several proof steps. 



This is why we design two-dimensional arrays, as shown in Table 12, to allow for statistical
data-mining of the two dimensions in their relation.
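
The two-dimensional array can be sketched as follows. This is an illustration under stated assumptions rather than ML4PG's code: the names `COLUMNS`, `MAX_STEPS`, `goal_level_table` and `flatten` are hypothetical, truncating longer proofs to the first five steps is our reading of "five consecutive subgoals", and the cell values are transcribed from Table 13 with 0 generated subgoals assumed for steps that close a goal.

```python
from typing import List, Optional, Sequence, Tuple

# One column per tracked property of a proof step (Table 12).
COLUMNS = ("tactics", "n_tactics", "arg_type", "arg_is_hyp", "top_symbol", "n_subgoals")
MAX_STEPS = 5  # rows g1..g5

Row = Tuple[Optional[object], ...]
EMPTY_ROW: Row = (None,) * len(COLUMNS)


def goal_level_table(steps: Sequence[Row]) -> List[Row]:
    """Keep at most MAX_STEPS proof steps and pad shorter proofs with empty rows."""
    rows = list(steps[:MAX_STEPS])
    rows += [EMPTY_ROW] * (MAX_STEPS - len(rows))
    return rows


def flatten(table: Sequence[Row]) -> List[Optional[object]]:
    """Read the table row by row into one fixed-length list of 5 x 6 = 30 features."""
    return [cell for row in table for cell in row]


# Goal-level rows of mult_n_0, as in Table 13.
mult_n_0_steps: List[Row] = [
    ("induction", 1, "nat", "no", "forall", 2),
    ("simpl; trivial", 2, "none", "no", "equal", 0),
    ("simpl; trivial", 2, "none", "no", "equal", 0),
]

mult_n_0_table = goal_level_table(mult_n_0_steps)
mult_n_0_features = flatten(mult_n_0_table)
```

Whatever the length of the proof, the result has exactly 30 entries, which is the fixed-size requirement stated at the start of this section; the row index records the position of a step in the proof, and the column index records which property is being tracked.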



Example 14 Consider Table 13, where the correlation between mult_n_0 and app_l_nil at the goal
level is shown. If we consider the table associated with app_l_nil, the fact of using the tactics
induction and (simpl; trivial) may not be significant, as this combination can be applied to a
variety of goals. It may be insignificant that the top symbol of the goal was the quantifier ∀. However,
the table related to this lemma allows us to characterise the goal ∀l : list A, l ++ [] = l by the
30 features (entries) of the table. Correlations of values of these features will be more likely to show
significant proof patterns, if such exist.



Example 15 As can be seen in Table 16, there is a strong correlation between the features associated
with the first step in the proofs of sum_first_n, sum_first_n_odd and fact_prod. However, this
strong correlation only remains between sum_first_n and fact_prod when successive proof steps are
considered. This illustrates the fact that we cannot focus just on the first goal of a proof, but have to
study its proof trace to obtain relevant patterns.
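
The effect described in Example 15 can be illustrated by counting agreeing cells. The sketch below makes two assumptions that are ours, not the paper's: agreement means exact equality of non-empty cells, and steps that close a goal have 0 in the last column. The rows are transcribed from the first two tables of Table 16; the function name `agreement` is hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[object], ...]
EMPTY: Row = (None,) * 6

# Columns: tactics, N tactics, arg type, arg is hyp?, top symbol, n subgoals.
SUM_FIRST_N: List[Row] = [
    ("elim", 1, "nat", "Hyp", "equal", 2),
    ("rewrite", 1, "3xProp", "3xEL", "equal", 0),
    ("move=>", 1, "nat,Prop", "no", "forall", 1),
    ("rewrite", 1, "6xProp", "2xEL,IH,3xEL", "equal", 0),
    EMPTY,
]

SUM_FIRST_N_ODD: List[Row] = [
    ("elim", 1, "nat", "Hyp", "equal", 2),
    ("rewrite", 1, "4xProp", "4xEL", "equal", 1),
    ("move=>,move/,move:,rewrite", 4, "nat,5xProp", "3xEL", "forall", 0),
    ("move=>", 1, "nat,Prop", "no", "forall", 1),
    ("rewrite", 1, "11xProp", "6xEL,IH,5xEL", "equal", 0),
]


def agreement(a: List[Row], b: List[Row]) -> int:
    """Number of positions where both tables hold the same non-empty value."""
    return sum(
        1
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
        if x is not None and x == y
    )


first_step = agreement(SUM_FIRST_N[:1], SUM_FIRST_N_ODD[:1])   # 6 of 6 cells agree
whole_trace = agreement(SUM_FIRST_N, SUM_FIRST_N_ODD)          # 11 of 30 cells agree
```

On the first step alone the two lemmas are indistinguishable, while over the whole trace only 11 of the 30 cells agree, which matches the count quoted in the caption of Table 16 for this pair. The clustering algorithms used by ML4PG work on numeric vectors and need not reduce to such a count.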

Note that this method gathers statistics both dynamically (considering several proof steps instead 
of just one) and relationally (the information about the goals is related to the applied tactics, and the 
features for the five tactics are data-mined together, so their correlation plays a role in the proof pattern 
recognition). Another advantage of the method is its focus on user interaction: ML4PG learns proof 
patterns specific to the user's proof style as given in the chosen library of proofs. 

On the tactic level, ML4PG focuses on features associated with each tactic applied in a proof script.
Such properties are gathered using Table 17. It is worth noting that the structure of the goal-level
Table 12 can be reused in all the systems based on the application of a sequence of tactics (e.g. Coq,
Isabelle/HOL, Matita, etc.); the only difference would be the values which populate the table. By
contrast, the structure of the tactic-level table depends on the concrete system, since each ITP has its
own concrete set of tactics.

The case of Coq is special, since we can find two proof styles: plain Coq and SSReflect. Although
SSReflect is an extension of Coq, this package implements a set of proof tactics designed to support the
extensive use of small-scale reflection in formal proofs [19]. In addition, the behaviour of some Coq
tactics has been modified (for instance, the rewrite tactic); so SSReflect imposes a distinct proof
style. ML4PG works with both styles of Coq proofs. In the case of plain Coq, the rows of the tactic table
represent the main Coq tactics (from almost 100 Coq tactics we only study the 10 most popular). The set
of SSReflect tactics consists of fewer than 10 tactics, so we have included all of them.



Example 18 Consider the fragments of tactic-level tables associated with the Lemmas app_nil_l and
mult_0_n, in Table 19. The extracted features show close correlation, as expected.
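
A tactic-level table has one row per tactic rather than one row per proof step. The following sketch aggregates only two of the properties named in the caption of Table 17 (top symbols and number of applications) and leaves the argument-related columns out. The row list `TRACKED_TACTICS` is a hypothetical stand-in, since the 10 plain Coq tactics tracked by ML4PG are not listed here; the proof steps are taken from Table 4.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical fixed row order; a fixed order keeps the table size constant.
TRACKED_TACTICS = ("intro", "simpl", "trivial", "induction", "rewrite")

Step = Tuple[str, str]  # (tactic expression, top symbol of the goal it is applied to)


def split_tactics(expr: str) -> List[str]:
    """Split 'simpl; trivial' into ['simpl', 'trivial'], dropping tactic arguments."""
    return [part.split()[0] for part in expr.split(";") if part.strip()]


def tactic_level_table(steps: Sequence[Step]) -> Dict[str, Dict[str, object]]:
    """Aggregate, per tracked tactic, the goals' top symbols and the usage count."""
    table: Dict[str, Dict[str, object]] = {
        name: {"top_symbols": [], "times_applied": 0} for name in TRACKED_TACTICS
    }
    for expr, top_symbol in steps:
        for name in split_tactics(expr):
            if name in table:
                table[name]["top_symbols"].append(top_symbol)
                table[name]["times_applied"] += 1
    return table


# Proof steps of Table 4.
mult_0_n = [("intro", "forall"), ("simpl; trivial", "equal")]
app_nil_l = [("intro l", "forall"), ("simpl; trivial", "equal")]

same_tactic_profile = tactic_level_table(mult_0_n) == tactic_level_table(app_nil_l)
```

For these two lemmas the reduced tables coincide, which is consistent with the close correlation reported in Example 18; the argument-type columns omitted here would record the remaining difference between a proof over nat and a proof over lists.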
      tactics                           N tactics    arg type         arg is hyp?          top symbol    n subgoals
g1    elim                              1            nat              Hyp                  equal         2
g2    rewrite                           1            3 × Prop         3 × EL               equal         0
g3    move =>                           1            nat, Prop        no                   forall        1
g4    rewrite                           1            6 × Prop         2 × EL, IH, 3 × EL   equal         0
g5

      tactics                           N tactics    arg type         arg is hyp?          top symbol    n subgoals
g1    elim                              1            nat              Hyp                  equal         2
g2    rewrite                           1            4 × Prop         4 × EL               equal         1
g3    move =>, move/, move:, rewrite    4            nat, 5 × Prop    3 × EL               forall        0
g4    move =>                           1            nat, Prop        no                   forall        1
g5    rewrite                           1            11 × Prop        6 × EL, IH, 5 × EL   equal         0

      tactics                           N tactics    arg type         arg is hyp?          top symbol

Table 16: Goal-level feature extraction tables for sum_first_n, sum_first_n_odd and fact_prod. In the tables of
sum_first_n_odd and fact_prod, we highlight in bold the correlation with the features of sum_first_n. There is a strong
correlation between sum_first_n and fact_prod (28 out of 30 features); by contrast, there is a weak correlation between
sum_first_n and sum_first_n_odd (11 out of 30 features).
           arg type    rest arg types    arg is hyp?    top symbol    n times used
move =>
move :
move/
rewrite
case
elim

Table 17: Fragment of tactic-level feature extraction table for SSReflect. The rows are the tactics implemented in the given
ITP. In contrast to the rows, the columns of this table encode the same properties for both plain Coq proofs and proofs
in SSReflect: the type of the first argument; a number that denotes the types of the remaining tactic arguments; identification
of whether the arguments of the tactic are hypotheses (Hyp), inductive hypotheses (IH), external lemmas (EL) or none of them.
Finally, ML4PG populates the last two columns with the list of top symbols of the goals where the tactic has been applied, and
the number of times the tactic was applied.
Table 19: Tactic-level feature extraction tables for mult_0_n and app_nil_l in plain Coq, showing close correlation in bold.
Table 21: Tactic-level feature extraction table for sum_first_n, sum_first_n_odd and fact_prod in SSReflect, showing
correlation in bold: 15 out of 30 between sum_first_n and sum_first_n_odd; and 28 out of 30 between sum_first_n and
fact_prod.



Example 20 Fragments of tactic tables for Lemmas sum_first_n, sum_first_n_odd and fact_prod
are given in Table 21. There is a strong correlation between sum_first_n and fact_prod at this level.



Finally, ML4PG can extract the tree-level features, see Table 22. Currently, it considers the proof
flow up to depth 5 of the proof tree.

Example 23 The tables for Lemmas sum_first_n, sum_first_n_odd and fact_prod at the proof
tree level are given in Table 24.
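
The branching information tracked at the tree level can be sketched as below. This follows one reading of the convention stated in the caption of Table 22 (the depth level, followed by the number of subbranches of each branch at that level, plus a count of branches closed at that level); the function name `level_encoding`, the list representation of the code and the choice of depth 1 for the root are assumptions made for illustration. The tree is that of Figure 7.

```python
from typing import Dict, List, Tuple

# A node is (tactic, children); a node with no children closes its branch.
Node = Tuple[str, list]


def level_encoding(root: Node, max_depth: int = 5) -> Dict[int, Tuple[List[int], int]]:
    """For each depth d (root = 1) return ([d, k1, k2, ...], closed).

    k1, k2, ... are the numbers of subbranches of the nodes at depth d, and
    closed is the number of nodes at depth d that finish their branch.
    """
    result: Dict[int, Tuple[List[int], int]] = {}
    level: List[Node] = [root]
    depth = 1
    while level and depth <= max_depth:
        counts = [len(children) for _, children in level]
        closed = sum(1 for k in counts if k == 0)
        result[depth] = ([depth] + counts, closed)
        level = [child for _, children in level for child in children]
        depth += 1
    return result


# Proof tree of app_l_nil (Figure 7), labelled by the tactic applied at each node.
app_l_nil: Node = ("induction l", [
    ("simpl; trivial", []),
    ("simpl", [("rewrite IHl", [("trivial", [])])]),
])

encoding = level_encoding(app_l_nil)
```

For app_l_nil this yields [1, 2] at the root (induction opens two branches), [2, 0, 1] with one closed branch at depth 2, then [3, 1] and finally [4, 0] with the last branch closed.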

The feature extraction procedures explained in this section run in the background of ML4PG. Some of
the features are obtained just by inspecting the names and numbers of the applied tactics. In other cases,
ML4PG internally invokes Coq to obtain the feature, for instance, when recording types of arguments.
Thus, statistics related to the three proof levels are automatically gathered during the proof development.

Machine learning algorithms expect numerical feature vectors as inputs; therefore, ML4PG converts
the features into numbers. As we explained in [31], the concrete function that ML4PG uses for this
purpose may vary, but the numeric conversion must be consistent. Dynamic calculation of the function
that converts table features into numbers is implemented in ML4PG; for lack of space, we do not
discuss it here; full details are available in [25]. Once the feature vectors are collected, ML4PG can
data-mine the proofs using different machine learning algorithms.
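
The consistency requirement can be illustrated with a small encoder that assigns numbers on demand. This is a sketch under our own assumptions and not the conversion function of ML4PG, which is described in [25]: the class `FeatureEncoder` is hypothetical, codes are handed out in order of first appearance, and empty cells map to 0.

```python
from typing import Dict, Hashable, List, Optional, Sequence


class FeatureEncoder:
    """Maps each distinct feature value to a number, the same one every time."""

    def __init__(self) -> None:
        # Code 0 is reserved for an empty table cell.
        self._codes: Dict[Optional[Hashable], int] = {None: 0}

    def encode(self, value: Optional[Hashable]) -> int:
        """Return the code of a value, creating a fresh code on first sight."""
        if value not in self._codes:
            self._codes[value] = len(self._codes)
        return self._codes[value]

    def encode_table(self, table: Sequence[Sequence[Optional[Hashable]]]) -> List[float]:
        """Flatten a feature table row by row into a numeric feature vector."""
        return [float(self.encode(cell)) for row in table for cell in row]


encoder = FeatureEncoder()
vec_nat = encoder.encode_table([("induction", 1, "nat")])
vec_list = encoder.encode_table([("induction", 1, "list")])
```

Because one encoder is shared across all proofs of a session, equal table entries always receive equal numbers (the first two components of `vec_nat` and `vec_list` coincide), while different entries such as nat and list are kept apart. Genuinely numeric columns could alternatively be passed through unchanged.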



4 Interactive proof-clustering in ML4PG 

ML4PG is designed to prove the concept: it is possible to interface higher-order proofs with machine
learning engines, and to do so interactively during the proof process. Interaction with several machine
learning engines and algorithms is at the core of this process. This differs from the experiments performed
in the literature, see [16, 31, 34, 45], where the data-mining of proofs is performed either statically or
through ad-hoc implementation of statistical algorithms. In this section, we explain how ML4PG enables
user interaction with a range of machine learning engines and algorithms, and give some technical
details of the ML4PG implementation.

       case    elim    rewrite    ...    •    □
td1
td2
td3
td4
td5

Table 22: Proof tree level feature extraction table for SSReflect. The rows, marked as td1-td5, represent tree levels. The
values of columns depend on the tactics, and for each tactic, a different parameter is tracked: for tactics case and elim - it is
the type of argument; for tactic rewrite - whether the rewriting rule is a hypothesis, inductive hypothesis, or external lemma.
In the • column ML4PG stores the branching factor of the proof tree. It encodes this information globally using the following
convention. The first number represents the tree depth level, so, if we are at level 1 the first number will be 1, at level 2 the
first number will be 2 and so on. Then for each one of the branches of the level we store the number of its subbranches. The □
column indicates the number of proof branches closed at the given level.

Table 24: Fragment of proof tree level feature table for sum_first_n, sum_first_n_odd and fact_prod. We highlight in
bold the corresponding features between sum_first_n and fact_prod (13 out of 14 features).

The ML4PG user may or may not be an expert in machine learning. Either way, ML4PG must offer 
him a number of simple but useful options to configure machine learning tools while staying within the 
Proof General environment. Therefore, ML4PG takes on the burden of connecting to the machine learning
algorithms, see also Figures 1 and 25.



The first choice the user makes concerns the proof level: proofs can be data-mined at the level of
goals, tactics or proof trees, as explained in the previous sections. It is worth mentioning here that there
are several choices of how to run this feature extraction. One option would be to extract features on
demand - that is, once the user chooses the proof level, ML4PG could re-run Coq to complete the
feature extraction. The disadvantage is that the proof engine would have to be re-run every time one
uses ML4PG for data-mining. We made a different choice: ML4PG extracts features in the background
during the interactive proofs. It does the extraction at all three proof levels whenever the proof library is
compiled. Then, the choice of proof level in the menu (cf. Figure 1) just indicates which data set will
be sent to the machine learning algorithms. The advantage is that the time taken by data mining does
not include the proof engine run. Our experiments show that the time spent on feature extraction
during normal Coq compilation is unnoticeable and does not significantly slow down
proof development.

Figure 25: ML4PG option to select the machine learning engine.

Now ML4PG is ready to communicate with machine learning interfaces. ML4PG is built to be
modular - that is, when the feature extraction of Section 3 is completed within the Emacs environment,
the data is gathered in the format of hash tables. The first elements of these tables are the names of
the lemmas, and the second elements are the feature vectors encoded as lists of numbers (note
that Emacs is a Lisp environment; therefore, it is sensible to use lists to represent the feature vectors).
However, every machine learning engine has its own concrete format for feature vectors; therefore, it
is necessary to define translators that adapt ML4PG's internal encoding of feature vectors to the concrete
representation of the machine learning engine. We have defined translators for two different, but equally
popular, machine learning interfaces - Matlab and Weka.



Example 26. Figure 25 shows the ML4PG graphical interface accommodating such interfacing. In reality,
the choice only changes the translator applied to the feature vectors obtained from ML4PG's
feature extraction mechanism, and the destination to which the resulting feature vectors are sent.

Notice the similarity with the implementation of the proof level choice: again, once the features are
extracted, the ML4PG engine is flexible enough to use them for all sorts of data mining tasks and machine
learning interfaces. ML4PG transforms the feature vectors into a comma-separated values (CSV) file in the
case of Matlab, and into ARFF files in the case of Weka. In principle, extending the list of machine learning
engines does not require any further modifications to the feature extraction algorithm, but just the
definition of new translators.
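
The idea of a translator can be sketched as follows. This is a Python illustration under our own assumptions, not the Emacs Lisp translators shipped with ML4PG: the function names to_csv and to_arff, the attribute names f1, f2, ... and the sample table are hypothetical. Both functions take the same internal table (lemma name mapped to a list of numbers) and differ only in the concrete file syntax they produce.

```python
import csv
import io


def to_csv(vectors):
    """Render {lemma name: feature vector} as CSV text, one row per lemma."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for features in vectors.values():
        writer.writerow(features)
    return buffer.getvalue()


def to_arff(vectors, relation="proofs"):
    """Render the same table as ARFF text, one numeric attribute per feature."""
    width = len(next(iter(vectors.values())))
    lines = [f"@RELATION {relation}", ""]
    lines += [f"@ATTRIBUTE f{i} NUMERIC" for i in range(1, width + 1)]
    lines += ["", "@DATA"]
    lines += [",".join(str(x) for x in features) for features in vectors.values()]
    return "\n".join(lines) + "\n"


# Hypothetical feature vectors for two lemmas.
table = {"mult_n_0": [1, 2, 0], "plus_n_0": [1, 2, 1]}
csv_text = to_csv(table)
arff_text = to_arff(table)
```

In this sketch the lemma names are not written to the files; the row order follows the order of the table, so a cluster of row indices can later be mapped back to lemma names. Supporting a further engine would then amount to adding one more function of this kind.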

Once the feature vectors are in a suitable format, ML4PG can invoke the machine learning engine. 
The ML4PG mechanism connecting to machine learning interfaces is similar to the native mechanism of 
Proof General used to connect to ITPs. Namely, there is a synchronous communication between ML4PG 
and the machine learning interfaces, which run in the background waiting for ML4PG calls. 

The second configuration option ML4PG offers is the choice of the particular pattern-recognition
algorithm available from the chosen machine learning interface. Again, this choice is made within the
proof environment of Proof General, see Figure 28. There are several machine learning algorithms
available in Matlab and Weka. We connected ML4PG only to clustering algorithms [8] - a family of
unsupervised learning methods. Unsupervised learning is chosen when no user guidance or class tags
are given to the algorithm in advance.



Example 27. One could in principle envisage supervised machine learning applications in proof pattern
recognition, where the user labels every proof using a finite set of tags, such as "fundamental lemma",
"auxiliary lemma" or "proof experiment". On the basis of such labels and some number of training
examples, the machine learning algorithm would then be able to predict labels for any new proof.

Here, we do not assume the existence of such labels. However, our modular approach to interfacing with
Proof General implies that, if the labels are available, interfacing with supervised algorithms will not be
hard for ML4PG. In fact, we envisage the feature extraction method to remain the same.

Figure 28: ML4PG option to select the clustering algorithm when working with Weka.

Figure 29: ML4PG option to select the clustering libraries and to export libraries.

In the case of Matlab, the algorithms included in ML4PG are two of the most popular methods for
clustering: K-means and Gaussian [8]. If the user selects Weka as the machine learning engine, then he
can select among K-means, FarthestFirst and simple Expectation Maximisation, see Figure 28.

To improve the accuracy of the clustering algorithms, a technique called Principal Components Analysis
(PCA) [28] is applied. It reduces the size of the feature vectors without much loss
of information. The application of techniques like PCA, known in general as dimensionality reduction
procedures [8], is recommended when dealing with feature vectors of size greater than 15 - as in
our case.
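
The effect of PCA can be illustrated with a deliberately small Python sketch. It is an assumption-laden toy, not what Matlab or Weka run internally: it extracts only the first principal component, uses power iteration instead of a full eigendecomposition, and the function names and data are hypothetical.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (mean, unit direction) for the direction of largest variance."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    # Covariance matrix of the centred data.
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(rows, mean, direction):
    """Reduce each vector to its coordinate along the given direction."""
    return [sum((r[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for r in rows]


# Hypothetical 3-dimensional vectors lying close to the line x = y, z = 0.
data = [[1.0, 1.1, 0.0], [2.0, 1.9, 0.0], [3.0, 3.2, 0.0], [4.0, 3.9, 0.0]]
mean, direction = first_principal_component(data)
reduced = project(data, mean, direction)
```

Here three coordinates per vector are replaced by one, yet the order and spacing of the four points along their common line are preserved, which is the sense in which little information is lost.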



Finally, the user can choose the proof libraries that he wants to access using ML4PG (see Figure 29).
Before using them, those libraries must be exported with the mechanism provided by ML4PG. ML4PG
extends the compilation procedure that Coq uses for imported libraries with the feature extraction
algorithm described above. This mechanism checks that all the proofs of the library are finished, and
generates a file which contains the list of the lemmas of the library, and three files encoding respectively
the feature vectors at the goal level, tactic level and proof tree level. Subsequently, when the user chooses
a library, ML4PG transforms the files into the internal encoding of feature vectors (implemented by Lisp
lists) and appends those vectors to those obtained in the current development.

By default, ML4PG clusters the current library, but the user can add more libraries to the clustering.
The reason for not using all the available libraries is twofold. First of all, it is a question of
performance, since the time needed to obtain clusters increases with the number of libraries. The second
reason is usability: if ML4PG uses all the available libraries for clustering, it can obtain
proof patterns from lemmas which belong to libraries unknown to the user, and this may or may not be
convenient.

Figure 31: Clusters for the library Initial. The Proof General window has been split into two windows positioned side by
side: the left one keeps the current proof script, and the right one shows the clusters and their frequencies. If the user clicks
on the name of a theorem shown in the right window, that window is split horizontally and a brief description of the selected
theorem is shown.

We now return to the examples introduced in Section 2 to illustrate how proof patterns are shown to
the ML4PG user.



Example 30. We created a small library (70 lemmas) to help us with the initial tests of ML4PG: it
contains some basic lemmas about natural numbers and lists, as well as our running examples of Tables
3 and 4. We also included efficient and inefficient proofs, cases where similar lemmas were proven
using different strategies, and cases where different lemmas were proven using the same proof strategy.
In the rest of the paper, we will call this library Initial. Figure 31 shows the result of running ML4PG
on this library, with the following settings:

• statistics were gathered using goal-level feature extraction;

• machine learning interface: Matlab; 

• machine learning algorithm: k-means clustering. 

ML4PG shows that all lemmas of Tables 3 and 4 belong to the same group of proofs. This agrees with
one possible interpretation of the content of these lemmas, see Section 2. But there are other
lemmas in that cluster; in particular, this cluster gathers "fundamental" lemmas about various operations
on natural numbers involving 0 and operations on lists involving nil.

The example above shows one mode of working with ML4PG: that is, when a library is clustered
irrespective of the current proof goal. However, it may be useful to use this technology to aid
interactive proof development. In that case, we can cluster libraries relative to a few initial proof steps
for the current proof goal. The next example illustrates this. Note also that the ML4PG graphical
interface offers two menu buttons for these two options - the two right-most buttons of Figure 31.

Figure 33: On the left side, the development of the Lemma sum_first_n. On the top part of the right side, the current
goal. On the bottom part of the right side, several suggestions provided by ML4PG. If the user clicks on the name of one of the
suggested lemmas, a brief description of it is shown.



Example 32. On the left side of Figure 33, an incomplete development of Lemma sum_first_n is
shown. Using the bigop and binomial libraries of SSReflect (around 200 lemmas), ML4PG can obtain
proof patterns similar to the lemma that we are proving, see the right side of Figure 33. The ML4PG
settings in this case were:

• statistics were gathered using goal-level feature extraction;

• machine learning interface: Weka; 

• machine learning algorithm: Expectation Maximisation. 

Among the suggestions provided by ML4PG, we found the Lemma fact_prod. Note that, as we
have seen in Section 3, there is a high correlation between the feature vectors of sum_first_n and
fact_prod. Other lemmas ML4PG discovered are related to series of natural numbers (including
properties about big sums and big products). Lemmas like sum_first_n_odd, where there is a restriction
on the elements of the series, belong to a different cluster, since their correlation with lemmas like
sum_first_n and fact_prod is low.

We have shown the flexibility, modularity and interactivity of ML4PG in interfacing with machine learning
environments. These features come for free with the light version of "backwards interfacing" that
ML4PG implements: that is, it does not translate the outputs of the clustering algorithms back into the
Coq language. Its only form of backwards interfacing is the conversion of clustered feature vectors back
into lemma names - the output shown in Figures 31 and 33.



Generally, interfacing ITPs with external tools (e.g. ATPs) is a challenging task, see [37, 24, 32, 1].
A special concern is the translation of the output produced by the external tools into the ITP. This is
due to the fact that an unsound translation can introduce inconsistencies in the system. In the case of
interfacing with machine learning, the external tool is even more alien to the ITP's syntax than ATPs are.
The light backwards interfacing implemented in ML4PG may well be the optimal solution to the problem.

5 Handling proof statistics in ML4PG 

The previous sections highlighted two features of ML4PG - light backwards interfacing and interactive 
handling of machine learning interfaces. To handle these features gracefully, ML4PG must offer the 
user a convenient environment for processing and analysing the results obtained by machine learning 
algorithms. Two parameters play an important role in analysis of clusters and their accuracy: these are 
the number of clusters and their frequency. 

Clustering techniques divide data into n groups of similar objects (called clusters), where the value of 
n is a "learning" parameter provided by the user together with other inputs to the clustering algorithms. 
Increasing the value of n means that the algorithm will try to separate objects into more classes, and, as a 
consequence, each cluster will contain examples with higher correlation. The frequencies of clusters can 
serve for analysis of their reliability. Results of one run of a clustering algorithm may differ from another, 
even on the same data set. This is due to the fact that clustering algorithms randomly choose examples 
to start from, and then, form clusters relative to those examples. However, it may happen that certain 
clusters are found repeatedly - and frequently - in different runs; then, we can use these frequencies to 
determine the reliable clusters. 
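
The notion of cluster frequency can be made concrete with a short Python sketch. It is an illustration under our own assumptions rather than the Matlab or Weka programs used by ML4PG: it uses a plain k-means written for the occasion, identifies a cluster with the exact set of objects it contains, and the data points are hypothetical.

```python
import random
from collections import Counter


def kmeans(points, k, rng, rounds=20):
    """Plain k-means; returns the clusters as a set of frozensets of indices."""
    dim = len(points[0])
    centres = rng.sample(points, k)  # random starting examples
    for _ in range(rounds):
        groups = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            groups[dists.index(min(dists))].append(idx)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [
            [sum(points[i][j] for i in g) / len(g) for j in range(dim)] if g else c
            for g, c in zip(groups, centres)
        ]
    return {frozenset(g) for g in groups if g}


def cluster_frequencies(points, k, runs=200, seed=0):
    """Fraction of runs in which each exact cluster (set of indices) appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(runs):
        counts.update(kmeans(points, k, rng))
    return {cluster: n / runs for cluster, n in counts.items()}


# Hypothetical 2-D feature vectors: two tight groups and one outlier.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [30, 0]]
freqs = cluster_frequencies(points, k=3)
```

Since the starting examples are drawn at random, different runs may group the same points differently; a cluster that reappears in a large fraction of the runs is the kind of cluster regarded as reliable in the text above.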

ML4PG's tools for handling statistical results include Matlab and Weka programs that post-process the
outputs of the clustering algorithms. For each clustering algorithm the user invokes, ML4PG generates one
corresponding program handling the output statistics. These programs always have three arguments:
a file and two natural numbers representing the number of clusters and the frequency threshold. We
explain these settings in this section.

The file stores the feature vectors using a format accepted by the machine learning engine. We use
CSV files for Matlab and ARFF files for Weka. These files are automatically generated by ML4PG during
the feature extraction process described in Section 3. However, ML4PG users can vary the other two
parameters.

When ML4PG sends proof features to machine learning engines, the second parameter indicates the
number of clusters, given by a positive integer n. Various numbers of clusters can be useful for interactive
proof data-mining: this may depend on the size of the data set, and on existing similarities between the
proofs. We want ML4PG to accommodate such choices. In general, small values of n are useful when
searching for general proof patterns, which can later be refined by increasing the value of n. However,
extreme values are to be avoided: small values of n can produce meaningless proof clusters for big proof
libraries, whereas trivial clusters with just one proof may be found for big values of n. Very often in
machine learning, the optimal number of clusters is determined experimentally, but we cannot afford this
in the ML4PG setting, as we cannot assume that the users will be willing to invest time and effort into
such experiments.

In the machine learning literature, there exist a number of heuristics to determine this optimal number
of clusters [47]. We used them as an inspiration to formulate our own algorithm for ML4PG, tailored
to interactive proofs. It takes into consideration the size of the proof library and an auxiliary parameter
we introduce here, called granularity. This parameter is used to calculate the optimal number of
proof clusters, using the formulas of Table 34. As a result, the user does not provide the value of n, but
just decides on the granularity in the ML4PG menu by selecting a value between 1 and 5, where 1 stands
for a low granularity (producing big and general clusters) and 5 stands for a high granularity (producing
small and precise clusters). See Figure 35.

Granularity    Number of clusters
1              ⌊l/10⌋
2              ⌊l/9⌋
3              ⌊l/8⌋
4              ⌊l/7⌋
5              ⌊l/6⌋

Frequency parameter    Frequency threshold
1                      5%
2                      15%
3                      30%

Table 34: ML4PG formulas computing clustering parameters. Left: the formula computing the number of clusters given
the granularity value, where l is the number of lemmas in the library. Right: the formula computing frequency thresholds given
a frequency parameter.

Figure 35: ML4PG option to determine the granularity.
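
The formulas of Table 34 can be written down in a few lines of Python. The sketch below is illustrative only: the function name is hypothetical, the closed form l // (11 - g) is our compact rewriting of the table rows, and the divisor 7 for granularity 4 is inferred from the pattern of the other rows rather than read from the table.

```python
def number_of_clusters(lemmas, granularity):
    """Table 34, left: floor(l / (11 - g)) for a granularity g between 1 and 5."""
    if granularity not in range(1, 6):
        raise ValueError("granularity must be between 1 and 5")
    # Note: this yields 0 for libraries smaller than the divisor; how ML4PG
    # treats that case is not stated in the text.
    return lemmas // (11 - granularity)


# Frequency parameter -> threshold, as listed in Table 34 (right).
FREQUENCY_THRESHOLD = {1: 0.05, 2: 0.15, 3: 0.30}

# Number of clusters for a 70-lemma library such as Initial.
clusters_by_granularity = {g: number_of_clusters(70, g) for g in range(1, 6)}
```

For a library of 70 lemmas, the default granularity 3 thus gives 8 clusters and granularity 5 gives 11, so raising the granularity asks the clustering algorithm for more, and therefore smaller, clusters.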



Example 36. Consider Example 30. There, the default granularity was 3, and the cluster contained all
lemmas from Tables 3 and 4. Increasing the granularity, ML4PG discovers only Lemmas app_l_nil
and mult_n_0 (see * from Section 2) as well as similar proofs for plus_n_0 and minus_n_0. All of
them use induction, and prove properties concerning initial objects. Note that in Section 2 we conjectured
this separation of the examples of Tables 3 and 4 into two clusters as a desirable feature.



Example 37. In Example 32, ML4PG used the default granularity value of 3 to obtain ten suggestions
related to the Lemma sum_first_n. If the ML4PG user increases the granularity value to 5, he obtains
only one suggestion, see Figure 38: the Lemma fact_prod. Inspecting the proof of Lemma fact_prod
can give an insight into how to finish the proof of sum_first_n. We notice that we can apply the
Lemma big_nat_recr to our current goal and subsequently use the inductive hypothesis. The rest of
the proof is based on rewriting rules for natural numbers.

As implied by the above examples, the configuration of the granularity parameter can be approached
in two different ways: top-down and bottom-up. The top-down approach suggests first using a small
value for the granularity to obtain a general proof pattern, and then refining that pattern by increasing
the granularity value. By contrast, the bottom-up approach starts with a high granularity value to
find the most similar lemmas, and then decreases the granularity value to reveal more general - and
potentially less trivial - patterns.

Finally, the third parameter ML4PG uses to analyse clustering outputs is the frequency threshold. For
this purpose, ML4PG actually uses two criteria: the proximity and the frequency of the cluster. The output
of a clustering algorithm contains not only clusters but also a proximity value - a measure of how close
each object in one cluster is to objects in other clusters. This measure ranges from +1, indicating
points that are very distant from other clusters, through 0, indicating points that are not distinctly in one
cluster or another, to -1, indicating points that are probably assigned to the wrong cluster. We have
fixed 0.5 as an accuracy threshold, and all the clusters whose measure is under this value are ignored
by ML4PG. This criterion is fixed, and the user interface does not give access to it. However, the second
criterion, the frequency parameter, is customizable within the interface.

Figure 38: Suggestion about the Lemma sum_first_n using as granularity the value 5.
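
The range described above (+1 through 0 to -1) matches the standard silhouette value, so the following Python sketch assumes that the proximity measure is a silhouette; the text does not name it. Taking the mean silhouette of a cluster's members as the measure of the cluster, and comparing it with 0.5, is likewise our assumption about how the fixed threshold is applied. The function names and data are hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Silhouette s(i) = (b - a) / max(a, b) for every point i."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    values = []
    for i, p in enumerate(points):
        # Mean distance from p to the other members of each cluster.
        mean_dist = {}
        for label in set(labels):
            others = [q for j, q in enumerate(points)
                      if labels[j] == label and j != i]
            if others:
                mean_dist[label] = sum(dist(p, q) for q in others) / len(others)
        a = mean_dist.get(labels[i])  # cohesion: distance within own cluster
        rest = [d for lab, d in mean_dist.items() if lab != labels[i]]
        if a is None or not rest or max(a, min(rest)) == 0:
            values.append(0.0)  # convention for degenerate cases
            continue
        b = min(rest)  # separation: nearest other cluster
        values.append((b - a) / max(a, b))
    return values


def reliable_clusters(points, labels, threshold=0.5):
    """Labels of the clusters whose mean silhouette reaches the threshold."""
    values = silhouette_values(points, labels)
    keep = []
    for label in sorted(set(labels)):
        members = [v for v, lab in zip(values, labels) if lab == label]
        if sum(members) / len(members) >= threshold:
            keep.append(label)
    return keep


# Hypothetical data: cluster 0 is compact, cluster 1 is stretched towards it.
points = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 2], [9, 9]]
labels = [0, 0, 0, 0, 1, 1]
kept = reliable_clusters(points, labels)
```

In this example the compact cluster passes the 0.5 threshold, while the stretched cluster, one of whose points lies closer to the other cluster than to its own partner, is discarded.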

Our experience shows that analysis of frequencies may give two opposite effects. 

* On the one hand, high frequencies suggest that the proofs found in clusters have a high correlation, 
and that is a desirable property. 

** On the other hand, proofs with too high correlation may be too trivial for providing interesting 
proof hints. Therefore, it is sometimes useful to look for proof clusters with lower frequencies - 
as they may potentially contain those non-trivial analogies. 

Example 39. To illustrate this, in our running example, the four proofs from Tables 3 and 4 were initially
found in only 6% of runs (low frequency), see Figure 31, whereas there were other clusters with high
frequencies that contained trivially similar lemmas.

To gather sufficient statistical data from proofs, ML4PG runs the chosen clustering algorithm 200
times at every call of clustering, and collects the frequencies of each cluster. After discarding those with
low proximity, it calculates the frequency of the significant patterns. Once frequencies are calculated,
ML4PG applies the following methodology. As item * suggests, one purpose of the frequencies is
to serve as thresholds: if the number of times the cluster occurs falls below the pre-set threshold, the
corresponding proofs will not be displayed to the user. On the other hand, as item ** suggests, the
acceptable frequency threshold values may differ from proof to proof, and may depend on the purpose
of proof-mining. For this reason, ML4PG allows the user to vary the threshold values. At the moment, we
have implemented three choices: frequency parameters 1-3, as shown in Table 34. This particular range
of thresholds comes from our experience with several Coq and SSReflect libraries. However, in line
with our general modular approach to ML4PG design, we assume a wider range can be implemented, if
desired. Our current choice is to keep the ML4PG interface simple and minimalistic.

Figure 41: Effects of increasing granularity of clustering on the levels of goals and tactics for the library Initial. Left:
compared to Figure 31, smaller clusters are formed when 5 is chosen as the granularity value at the goal level; frequencies
increase. Centre: tactic-level clusters formed with granularity parameter 2; the frequencies are relatively low. Right:
tactic-level clusters formed with granularity parameter 5; the clusters become smaller, and the frequencies increase.
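
The thresholding step can be sketched in Python as follows. This is an illustration, not ML4PG's code: the function name and the cluster labels are placeholders, and we assume that a cluster whose frequency equals the threshold exactly is still displayed. The two frequencies are the ones reported for the library Initial at granularity 5.

```python
# Frequency parameter -> minimum share of runs (Table 34, right).
THRESHOLDS = {1: 0.05, 2: 0.15, 3: 0.30}


def visible_clusters(frequencies, parameter):
    """Clusters whose frequency reaches the threshold, most frequent first."""
    limit = THRESHOLDS[parameter]
    kept = [(name, f) for name, f in frequencies.items() if f >= limit]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Frequencies reported for granularity 5; the keys are placeholder labels.
found = {"Table 3 lemmas": 0.47, "Table 4 lemmas": 0.07}
shown_at_1 = visible_clusters(found, 1)
shown_at_2 = visible_clusters(found, 2)
```

With frequency parameter 1 both clusters are displayed, whereas parameters 2 and 3 retain only the cluster found in 47% of the runs.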

Our next example shows an interesting interplay between the effects of varying granularity,
frequency, and proof level in the process of proof data-mining.

Example 40. In Example 30, ML4PG shows the clusters for our four running examples from the library
Initial, when using the default frequency value of 1. As Example 39 showed, when increasing the
frequency parameter, such a cluster would fall below the threshold (compare with Figure 31, where the
frequency of this cluster was 6%). At the same time, when increasing the granularity parameter to 5, our
four proofs are split into two smaller clusters, each with a higher frequency. Notably, inductive
proofs (see * and Table 3 from Section 2, and Example 36) are separated from those by simplification
(see ** and Table 4). The cluster containing only lemmas from Table 3 has a frequency of 47% (see the
left screenshot of Figure 41). The proofs from Table 4 also form a smaller cluster, but with a frequency of
7%. Therefore, both clusters are shown if the frequency parameter is 1, but we also have the option of
choosing a higher frequency parameter, 2 or 3, to discard the second, less significant, cluster.

This is a typical situation: small values of the granularity parameter usually produce big clusters with
low frequencies. When the granularity value is increased, the big clusters are split into smaller ones
with higher frequencies. Note the interactive nature of this proof-mining process.



Example 42. A similar effect of increasing the granularity parameter and increasing frequencies for the
tactic-level proof features is shown in Figure 41. The figure also demonstrates that data-mining the
same library using goal-level features and tactic-level features can bring different results. Interestingly,
with an increase of granularity, the goal-level clustering focuses on the examples related to lemmas in
Table 3, whereas the tactic-level clustering focuses on examples related to lemmas of Table 4, as we
conjectured in items * and ** of Section 2.

Figure 43: XML schema for handling results produced by different machine learning engines. An XML file following this
schema consists of a main node called clusters, which has as its children a sequence of pairs. The first element of each pair is
a node called cluster whose children are the lemmas belonging to the cluster. The second element of the pair is the
frequency of the cluster.

We finish this section with a discussion of the role of this statistical analysis in our approach to
light backwards interfacing. ML4PG handles the results obtained with Matlab and Weka in a uniform
way: for this purpose, we devised an XML format, see Figure 43. Using this approach, ML4PG can deal
with the output generated by any system which follows this XML standard using just one program, which
transforms the XML files into a suitable format for the user. As a consequence, ML4PG can be easily
extended with new engines and machine learning algorithms.

The XML files returned by the machine learning engines are processed in two different ways depending
on the mode of using ML4PG: that is, general clustering (as illustrated in Example 30) or
goal-dependent clustering (as shown in Example 32). In both cases, the XML file is converted into a
list of pairs, where the first element of each pair contains the lemma names of a cluster and the second
element the frequency of that cluster. In the general clustering case, this list is processed to be shown
as in Figure 31. For goal-dependent clustering, ML4PG searches for those pairs of the list in which the
current proof is included. If the current proof belongs to several clusters, then ML4PG takes the one
with the highest frequency and displays it as shown in Figure 33.
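
The two processing modes can be sketched in Python. This is an illustration under stated assumptions, not ML4PG's Emacs Lisp code: the element names lemma and frequency, their capitalisation, the use of lemma names as element text, the lemma big_sum_aux and the decision to omit the current lemma from its own suggestions are all hypothetical; only the overall shape (a clusters node containing cluster/frequency pairs) follows the description of the XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical engine output following the shape described for the XML format.
XML_TEXT = """
<clusters>
  <cluster><lemma>sum_first_n</lemma><lemma>fact_prod</lemma></cluster>
  <frequency>47</frequency>
  <cluster><lemma>sum_first_n</lemma><lemma>big_sum_aux</lemma></cluster>
  <frequency>7</frequency>
</clusters>
"""


def parse_clusters(xml_text):
    """Turn the XML into a list of (lemma names, frequency) pairs."""
    children = list(ET.fromstring(xml_text.strip()))
    pairs = []
    # Children alternate: a cluster node followed by its frequency node.
    for cluster, frequency in zip(children[0::2], children[1::2]):
        names = [lemma.text for lemma in cluster.findall("lemma")]
        pairs.append((names, int(frequency.text)))
    return pairs


def suggestions_for(pairs, current):
    """Goal-dependent mode: the most frequent cluster containing the current proof."""
    candidates = [pair for pair in pairs if current in pair[0]]
    if not candidates:
        return []
    names, _ = max(candidates, key=lambda pair: pair[1])
    return [name for name in names if name != current]


pairs = parse_clusters(XML_TEXT)
hints = suggestions_for(pairs, "sum_first_n")
```

In the general mode the whole list of pairs would be displayed; in the goal-dependent mode only the cluster selected by suggestions_for is shown, here the one with frequency 47.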



6 Conclusions and Further work 

In this paper, we have presented a Proof General extension, called ML4PG, to interface ITPs and machine 
learning engines. Our main goal was to prove that it is possible to interface higher-order interactive 
theorem proving with statistical machine learning; and the resulting tool can provide fast and non-trivial 
proof hints during the proof development. The technical highlights of ML4PG are: 

• the proof trace method is a flexible, extendable technique that gathers statistics from proofs on the 
basis of the relative transformations of simple parameters within several proof steps; and, 

• the light backwards interfacing implemented in ML4PG automates the Proof General interaction 
with machine learning engines. It helps to analyse and interpret the output of machine learning 
algorithms; however, it avoids full translation of the statistical outputs into the prover's language. 

The ML4PG approach has several benefits. First of all, it does not assume any knowledge of machine
learning interfaces from the user, and it automates initial statistical experiments (determining the number
of clusters, calculating frequencies) that otherwise would have been performed by hand. The choices for
various measures of cluster granularity and frequency can be easily extended in the future. Moreover,
it is a modular tool which allows the user to make choices regarding the proof level and
particular statistical algorithms. By design, it allows further extensions to different machine learning
environments, modes of supervised/unsupervised learning, and various learning algorithms within those
modes. In addition, it is tolerant to mixing and matching different proof libraries, different notations and
proof styles used across several developments.

Comparing across different proof levels and different styles of proofs, our experiments show that 
data-mining the goal-level features shows more interesting clusters compared to the other two feature 
extraction methods. We plan to improve the other two feature extraction methods in the future. Proofs 
in SSReflect yield more consistent classification results compared to the plain Coq style. This is due to a 
stricter proof discipline in SSReflect, which allows ML4PG to detect more significant proof patterns. 

We plan to integrate more machine learning methods to help in the proof process. To this end, we
need a tool which tracks not only the successful proofs, but also failed and discarded derivation steps. In
this way, we could use supervised machine learning algorithms to indicate to the user whether he is
following a sensible strategy based on previous experience. We are also considering the integration of
tools like R, Mahout or Octave, which offer algorithms that do not appear in Weka or Matlab.

Moreover, we are interested in increasing the number of proof assistants included in our framework. 
This will allow us to study proof similarities across different theorem provers. Since the interaction 
with theorem provers such as Isabelle or Lego is already available in Proof General, we just need the 
implementation of the feature extraction mechanism for them; their interaction with the machine learning 
engines would be the same as developed for Coq and SSReflect. 

Finally, the current implementation of ML4PG is centralised: the user can only obtain proof
clusters of the libraries available on his computer. However, we think that a client-server architecture,
where the proof information is shared among several users, could also be useful, especially for team-based
program development. For this purpose, feature extraction in ML4PG is already designed to be
independent of lemma names and notation.
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